Spotify is one of the larger music streaming services available today, with 345 million active users 1. Instead of requiring listeners to buy CDs or download every song they listen to, Spotify gives them access to millions of songs. Because Spotify is a centralized platform, it provides us with an opportunity to ask questions about what makes songs successful.
To investigate what makes some songs more successful than others, we will examine whether certain features correlate strongly with other features. In addition, we want to discover the most popular genre. We expect that our data will point to a few genres that may be the most popular, and that certain features will be strongly correlated with other features.
The data we are using is based on Spotify data from 1921 to 2020 and includes over 175,000 audio tracks. We found our data on Kaggle 2. This dataset groups the data by artist, genre, and year. There are eleven different variables measured in the dataset: acousticness, danceability, duration, energy, liveness, instrumentalness, loudness, speechiness, valence, popularity, and tempo.
Energy (en) is a perceptual measure of the intensity and activity of a track on a scale from 0.0 to 1.0. Some of the perceptual features that contribute to it are dynamic range, perceived loudness, timbre, onset rate, and general entropy. Liveness (li) ranges from 0.0 to 1.0 and detects whether an audience is present in a recording. If the liveness value is above 0.8, there is a strong likelihood that the track is live. Acousticness (ac) is a confidence measure of whether the track is acoustic. It varies from 0.0 to 1.0, with 1.0 representing high confidence that the track is acoustic. Loudness (lo) ranges from -60 to 0 and is measured in decibels (dB). It gives the overall loudness averaged over the entire track. The measure of danceability (db) combines tempo, rhythm stability, beat strength, and regularity. It rates how suitable a track is for dancing from 0.0 to 1.0, with 1.0 being the most danceable. Duration (dur) measures the length of the track in milliseconds (ms). The instrumentalness (ins) feature tracks whether a song contains vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are considered vocal. Instrumentalness ranges from 0.0 to 1.0, with 1.0 being the most instrumental. Speechiness (sp) is the opposite of instrumentalness, measuring the relative length of the track containing any kind of human voice. The tempo (tmp) feature gives the tempo of the track in beats per minute (BPM). Valence (val) measures the positiveness of the track; higher valence corresponds to more cheerful and upbeat songs. Lastly, popularity (pop) is calculated by an algorithm that is based on the total number of plays the track has had and how recent those plays are.
In the rest of our report, we will first group the genres into broader categories and then analyze the features across the genres. We will also compare features to each other and test the correlation between each pair of features to see whether they have a strong linear relationship. Lastly, we will look for the most popular genres by using t-tests to compare the genre popularity means. In the end, we will show how popularity is related to different genres, as well as how different features relate to each other.
There were 3232 genres. We condensed these into the top 20 occurring terms in these genres using regular expressions and counting the occurrences.
Here are all the original genres. As you can see there are thousands of them, and most of them are obscure. Some seem to not even make sense (like the genre “[]”).
These are the top 100 terms found across all the genres. These terms will be used to create the simplified genres. Note that some of the original genres are double counted; for example, the original genre african rock falls under both of the simplified genres african and rock.
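The term counting described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the report: the function name `count_terms`, the regular expression, and the sample genre names are all assumptions made for the example.

```python
import re
from collections import Counter


def count_terms(genres):
    """Count how many genre names contain each word-like term."""
    counts = Counter()
    for genre in genres:
        # Assumed tokenization: lowercase runs of letters, digits, & and '.
        # set() ensures a term repeated within one genre name counts once.
        terms = set(re.findall(r"[a-z0-9&']+", genre.lower()))
        counts.update(terms)
    return counts


# Hypothetical genre names, for illustration only.
sample_genres = ["african rock", "classic rock", "swedish pop", "pop rap", "[]"]
term_counts = count_terms(sample_genres)

# The two most frequent terms in this toy list (rock and pop, in either order).
top_terms = [term for term, _ in term_counts.most_common(2)]
```

In this toy list, african rock contributes one count to african and one to rock, which is the double counting noted above, and the genre “[]” contributes no terms at all.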
These are the top 20 terms found. They are the ones we will use. Note this only uses 60.7% of the data since 39.3% of the data do not fall under these top 20 categories.
We use these top 20 to create a more concisely labeled dataset. We also include the label other to account for the other 1743 genres with their occurrences less than or equal to 39.
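One way to assign the simplified labels is sketched below. It assumes whole-word matching against the list of top terms, which may differ from the matching rule actually used in the report; `simplify_genre` and the three-term list are hypothetical names and values chosen for the example.

```python
import re


def simplify_genre(genre, top_terms):
    """Return every top term that appears as a whole word in the genre name.

    A genre that matches none of the top terms is labeled "other".
    """
    words = set(re.findall(r"[a-z0-9&']+", genre.lower()))
    labels = [term for term in top_terms if term in words]
    return labels or ["other"]


# Hypothetical stand-in for the real top-20 list.
example_terms = ["african", "rock", "pop"]

labels_african_rock = simplify_genre("african rock", example_terms)
labels_bebop = simplify_genre("bebop", example_terms)
labels_empty = simplify_genre("[]", example_terms)
```

Because a genre can match several terms, the function returns a list: african rock maps to both african and rock, while bebop and “[]” fall into other under the whole-word assumption.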
This graph shows the number of occurrences of each simplified genre.
This graph shows the number of occurrences of each simplified genre without the “other” category (so the scale is slightly easier to read).
The second question we want to answer is whether any features have strong linear correlations with other features. To do this, we use r-values and their corresponding graphs.
First, we found the r-value for every pair of features, as shown in the table below.
The raw r-values between every combination of features.
The r-values in an easier-to-view format. Red and blue signify higher correlations.
As the table above shows, some features seem to have strong linear relationships, while some features seem to not have a strong linear relationship. To isolate those, we filtered for the absolute value of r-values only over .9 to find the strongest feature relations. We chose .9 as a threshold arbitrarily since there were many features that correlated. You can see some other thresholds below.
Threshold 0.9: There are 3 r-values above this threshold.
Threshold 0.8: There are 10 r-values above this threshold.
Threshold 0.7: There are 11 r-values above this threshold.
Threshold 0.6: There are 18 r-values above this threshold.
Threshold none: There are 55 r-values above this threshold.
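The filtering step can be illustrated with a small sketch that computes the Pearson r-value for every pair of features and keeps the pairs whose absolute value exceeds a threshold. The helper names `pearson_r` and `strong_pairs` and the toy feature values are assumptions for the example, not the report's code or data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_pairs(features, threshold):
    """Map each feature pair with |r| above the threshold to its r-value."""
    result = {}
    for a, b in combinations(sorted(features), 2):
        r = pearson_r(features[a], features[b])
        if abs(r) > threshold:
            result[(a, b)] = r
    return result


# Made-up values for three features, for illustration only.
toy = {
    "energy": [0.1, 0.4, 0.6, 0.9],
    "loudness": [-30.0, -18.0, -11.0, -4.0],
    "valence": [0.5, 0.2, 0.8, 0.4],
}
pairs = strong_pairs(toy, 0.9)
```

With eleven features there are 55 distinct pairs, which matches the count reported when no threshold is applied.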
In the graph and table below, we can again see that there are strong correlations between energy and the features acousticness, loudness, and tempo. It is also interesting to note that the three features that correlate strongly with energy also correlate with each other, although to a lesser degree.
It is hard to say exactly what this means, but it does suggest a few possibilities and speaks to the difference between correlation and causation. For example, there is an r-value of -0.8715355 between tempo and loudness. However, since we know that both of those features correlate even more strongly with energy, it may be that their relation to energy is what is more significant. This shows that these features are all highly related, and the fact that they all also correlate highly with each other suggests that they all measure something similar.
Plotted feature vs feature.
The raw r-values.
We created density plots to get an initial idea of how the genres relate to each feature. These plots are messy, so in the next section we will try to make sense of them. In particular, we will analyze the popularity feature as it relates to each genre.
### Assumptions
We ran t-tests to find differences between genres in the different features. The t-test statistic 3 is as follows:
\[ t = \frac{m_a - m_b}{\sqrt{\frac{s_a^2}{n_a}+\frac{s_b^2}{n_b}}} \]
We use this test statistic to calculate the p-value by finding the corresponding quantile of the Student's t-distribution with \(\max(n_a, n_b)-1\) degrees of freedom. While we will focus on analyzing popularity in particular in the next section, we run this test between every combination of genres for all features:
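As a rough illustration of the statistic above, the sketch below computes \(t\) from two samples using only the Python standard library. The p-value helper is an assumption of this example: it is two-sided and substitutes the standard normal distribution for the Student's t-distribution, which is only a reasonable stand-in when both groups are large. The function names and the toy popularity scores are likewise invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def t_statistic(a, b):
    """t = (m_a - m_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b), with sample variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def approx_two_sided_p(t):
    """Two-sided p-value under a standard normal approximation (assumption).

    The report uses a Student's t-distribution instead; for large samples
    the two give very similar values.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Made-up popularity scores for two genres, for illustration only.
genre_a = [60, 62, 58, 61, 59]
genre_b = [40, 45, 42, 38, 41]

t_value = t_statistic(genre_a, genre_b)
p_value = approx_two_sided_p(t_value)
```

The toy groups here are far too small for the normal approximation to be trustworthy; they only show how the pieces of the formula fit together.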
All t-tests between genres for acousticness.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for danceability.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for duration_ms.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for energy.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for instrumentalness.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for liveness.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for loudness.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for speechiness.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for tempo.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for valence.
Filtered for only significant differences (p-value < 0.05).
All t-tests between genres for popularity.
Filtered for only significant differences (p-value < 0.05).
We decided to further examine the popularity of the genres in depth to see if we could discover the most popular genre(s).
We made a graph comparing box plots of popularity for all the genres to get an overview of the distributions before jumping into our t-tests. Overall, the box plots show that rap has the highest mean popularity of all the genres. Next, we will use t-tests to evaluate whether the differences between the genre means are significant enough for us to conclude that rap has the highest mean popularity.
Additionally, we made a density plot of our estimator, the mean popularity, to check that it looks normally distributed. We did this to confirm that using a t-test is appropriate. As you can see, the plot seems normally distributed, so a t-test is appropriate.
We identified the most popular genre by finding the genre with the highest mean popularity. Here you can see that the genre with the highest mean popularity was rap.
We then ran the t-test for the feature popularity between all genres.
We then isolated the genres whose p-value was not below 0.05; these genres cannot be ruled out as being as popular as rap.
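The selection logic in this section can be sketched as two small helpers: one that picks the genre with the highest mean popularity, and one that keeps the genres whose p-value against that top genre does not fall below a significance cutoff. The names `top_genre`, `not_distinguishable`, and `alpha`, along with every number below, are made up for illustration and are not results from the report.

```python
from statistics import mean


def top_genre(popularity_by_genre):
    """Return the genre whose popularity scores have the highest mean."""
    return max(popularity_by_genre, key=lambda g: mean(popularity_by_genre[g]))


def not_distinguishable(p_values_vs_top, alpha=0.05):
    """Genres whose difference from the top genre is NOT significant."""
    return sorted(g for g, p in p_values_vs_top.items() if p >= alpha)


# Made-up popularity scores, for illustration only.
toy_popularity = {"rap": [60, 70], "rock": [40, 50], "hip": [58, 69]}
best = top_genre(toy_popularity)

# Made-up p-values from comparing each genre with the top genre.
toy_p_values = {"hip": 0.41, "swedish": 0.22, "rock": 0.001, "african": 0.0003}
candidates = not_distinguishable(toy_p_values)
```

Genres returned by `not_distinguishable` are the ones that cannot be ruled out as equally popular; the rest differ significantly from the top genre at the chosen cutoff.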
To put these results back into context, we show the mean and standard deviation of popularity from these genres. As you can see, their means were very similar, so it makes sense that their p-values were not significant.
The following graph shows the mean popularity for hip, rap, and swedish.
To determine the single most popular genre, we would need to observe a significant difference between the top genre's mean popularity and the mean of every other genre. From our t-tests on the genre popularity means, we were not able to come to a definite conclusion about which genre is the most popular. Because not all of the p-values are below .05, we do not have statistical evidence to conclude that the rap mean is significantly higher than the means of all the other genres. Hip and swedish have p-values above .05, so either of them could still be the most popular. However, for the genres with p-values below .05, we do have statistically significant evidence that they are not the most popular. This information could be useful for someone trying to create a song because it indicates which genres are most popular with listeners. A song in a more popular genre may attract more listeners, which can generate more revenue for the artist.
We also discovered which features have the strongest linear correlations with each other and which features have no linear relationship. We found that energy has a correlation with an absolute value over 0.9 with three other features: acousticness, loudness, and tempo. Acousticness has a negative correlation with energy, while loudness and tempo both have positive correlations with energy. Considering that acousticness, loudness, and tempo are all based on set measurements, while energy is calculated from intensity and activity in the song, this suggests that acousticness, loudness, and tempo may all affect the energy of a song. This finding matters because when a producer is trying to make the different characteristics of a song come together, the correlations between specific features may help them adjust those features so that they complement each other better.
A shortcoming of our analysis is that we do not know how many songs are included in the data for each genre. Some genres' data may be based on more songs than others. In addition, because we used only the 20 most frequent terms to group genres, some of the genres are not included in our analysis.
Future work on this dataset could involve testing more of the relationships between features and seeing whether they support strong models. We could also look for datasets from other music streaming services, such as Apple Music and Pandora.